Unsupervised Clustering Algorithm Based on Normalized Mahalanobis Distances

Author

  • JENG-MING YIH
Abstract

Several well-known fuzzy clustering algorithms are based on the Euclidean distance function, which can only detect spherical clusters. The Gustafson-Kessel and Gath-Geva clustering algorithms were developed to detect non-spherical clusters. However, the former needs an added constraint on the fuzzy covariance matrix, and the latter can only be used for data with a multivariate Gaussian distribution. Our previous works proposed three improved Fuzzy C-Means algorithms based on different Mahalanobis distances, called FCM-M, FCM-CM and FCM-SM. In this paper, we propose an improved Fuzzy C-Means algorithm based on a normalized Mahalanobis distance (FCM-NM), which uses a new threshold value and a new convergence process. Experimental results on two real data sets show that the proposed algorithm achieves better performance.

Key-Words: Unsupervised Clustering Algorithm, FCM-NM algorithm, GK-algorithm, GG-algorithm, Normalized Mahalanobis Distances

1 Motivation and Preface

Fuzzy clustering is a branch of cluster analysis that is widely used in pattern recognition. Well-known algorithms, such as Bezdek's Fuzzy C-Means (FCM) and Li et al.'s Fuzzy Weighted C-Means (FWCM) [1,2], are based on Euclidean distance. These algorithms can only detect data classes that share the same hyperspherical shape. To overcome this drawback, the distance measure can be extended to the Mahalanobis distance (MD). However, Krishnapuram and Kim (1999) [3] pointed out that the Mahalanobis distance cannot be used directly in a clustering algorithm. The Gustafson-Kessel (GK) clustering algorithm [4] and the Gath-Geva (GG) clustering algorithm [5] were developed to detect non-spherical clusters. The GK algorithm uses a modified Mahalanobis distance with preserved volume.
However, the added fuzzy covariance matrices in its distance measure were not directly derived from the objective function. In the GG algorithm, the Gaussian distance can only be used for data with a multivariate normal distribution. In our three previous works, we added a regulating factor of each class's covariance matrix to the objective function and deleted the GK algorithm's constraint on the determinants of the covariance matrices. This yielded Fuzzy C-Means algorithms based on adaptive Mahalanobis distances, a common Mahalanobis distance, and a standardized Mahalanobis distance (FCM-M, FCM-CM, and FCM-SM, respectively) [8-12,16], in which the fuzzy covariance matrices in the Mahalanobis distance can be derived directly by minimizing the objective function. In this paper, we replace the common covariance matrix in the objective function of the FCM-CM algorithm with the correlation matrix, and we replace the threshold D in equation (43) of the FCM-CM algorithm with the determinant of the crisp correlation matrix, R. The resulting new fuzzy clustering method is called the Fuzzy C-Means algorithm based on normalized Mahalanobis distance (FCM-NM).

Proceedings of the 9th WSEAS Int. Conference on APPLIED COMPUTER and APPLIED COMPUTATIONAL SCIENCE, ISSN: 1790-5117, ISBN: 978-960-474-173-1

2 Literature Review

Clustering techniques play an important role in data analysis and interpretation. Clustering groups data into clusters so that the objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters.

2.1 GK Algorithm

FCM works well only for spherical clusters. In its objective function, the distances between the data points and the cluster centers are Euclidean distances. To overcome this drawback, the distance measure can be extended to the Mahalanobis distance (MD).
However, Krishnapuram and Kim (1999) [2] pointed out that the Mahalanobis distance cannot be used directly in a clustering algorithm. Gustafson and Kessel (1979) extended the Euclidean distance of the standard FCM by employing an adaptive norm, in order to detect clusters of different geometrical shapes in one data set without changing the clusters' sizes. The objective function of the GK algorithm is given in Equations (1), (2), (3) and (4).
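
The adaptive, volume-preserving norm described above can be illustrated with a minimal sketch. It follows the form of the GK distance commonly given in the general literature (a fuzzy covariance matrix F_i per cluster and a norm-inducing matrix A_i scaled so that det(A_i) equals a fixed volume rho); it is not a transcription of this paper's Equations (1) to (4), which are not reproduced here. The function names, the fuzzifier default m=2 and the parameter rho are illustrative assumptions.

```python
import numpy as np

def fuzzy_covariance(X, center, u, m=2.0):
    """Fuzzy covariance of one cluster, weighting each point by u**m."""
    diff = X - center                      # (N, n) deviations from the center
    w = u ** m                             # (N,) fuzzy weights
    return (w[:, None] * diff).T @ diff / w.sum()

def gk_norm_matrix(F, rho=1.0):
    """Norm-inducing matrix A = (rho * det F)^(1/n) * F^-1, so det(A) = rho."""
    n = F.shape[0]
    return (rho * np.linalg.det(F)) ** (1.0 / n) * np.linalg.inv(F)

def gk_squared_distances(X, centers, U, m=2.0, rho=1.0):
    """Squared GK distances d2[i, k] = (x_k - v_i)^T A_i (x_k - v_i).

    X: (N, n) data, centers: (c, n) cluster centers, U: (c, N) memberships.
    """
    c, N = centers.shape[0], X.shape[0]
    d2 = np.empty((c, N))
    for i in range(c):
        diff = X - centers[i]
        A = gk_norm_matrix(fuzzy_covariance(X, centers[i], U[i], m), rho)
        d2[i] = np.einsum("kj,jl,kl->k", diff, A, diff)
    return d2
```

Because det(A) is fixed at rho, each cluster can change its shape and orientation but not its volume, which is the constraint the FCM-M, FCM-CM and FCM-SM variants are said to remove.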

Related Articles

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Fuzzy Possibility C-Mean Based on Mahalanobis Distance and Separable Criterion

The well-known fuzzy partition clustering algorithms are mostly based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters, but both of them, based on semi-supervised Mahalanobis distance, needed additional prior in...

Normalized Clustering Algorithm Based on Mahalanobis Distance

FCM (fuzzy c-means algorithm) based on Euclidean distance function converges to a local minimum of the objective function, which can only be used to detect spherical structural clusters. The added fuzzy covariance matrices in their distance measure were not directly derived from the objective function. In this paper, an improved Normalized Clustering Algorithm Based on Mahalanobis distance by t...

Extraction and 3D Segmentation of Tumors-Based Unsupervised Clustering Techniques in Medical Images

Introduction: The diagnosis and separation of cancerous tumors in medical images require accuracy, experience, and time, and have always been a major challenge for radiologists and physicians. Materials and Methods: We received 290 medical images composed of 120 mammographic images (LJPEG format, scanned in gray-scale at a size of 50 microns), 110 MRI images including T1-weighted, T...

Fuzzy C-Means Algorithm Based on Standard Mahalanobis Distances

Some of the well-known fuzzy clustering algorithms are based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The Gustafson-Kessel and Gath-Geva clustering algorithms were developed to detect non-spherical structural clusters. However, the former needs an added constraint on the fuzzy covariance matrix, and the latter can only be used for the d...

Publication date: 2010